Under-representation of repetitive sequences in whole-genome shotgun sequence databases: an illustration using a recently acquired transposable element.

Author

  • Akihiko Koga
Abstract

It is widely accepted, at a conceptual level, that repetitive sequences, especially those with high sequence homogeneity among copies, tend to be under-represented in whole-genome shotgun sequence databases because their sequence reads are difficult to assemble into contigs. Although this is easily inferred, no quantitative illustration of the phenomenon has been available. An example based on a database in current use should help convey intuitively how serious the under-representation is. The present study provides the first quantitative example (16 copies of virtually identical 4.7-kb sequences in a genome of 7 × 10^8 bp) by comparing the results of BLAST searches of a sequence database (contig N50: 9.8 kb) with those of Southern blot analysis of genomic DNA. The comparison reveals that the internal regions of the repetitive sequences are under-represented to a striking extent.
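To put the figures in the abstract in perspective, the following minimal sketch works out how much sequence a complete assembly would be expected to contain for this element. It uses only the copy number, copy length, and genome size quoted above; the function names are hypothetical, and `representation_ratio` is an illustrative helper rather than the measure used in the study.

```python
# Figures quoted in the abstract.
COPY_NUMBER = 16            # virtually identical copies in the genome
COPY_LENGTH_BP = 4_700      # 4.7 kb per copy
GENOME_SIZE_BP = 7 * 10**8  # approximate genome size


def expected_repeat_bp(copies: int, copy_length_bp: int) -> int:
    """Total length the element would occupy if every copy were assembled."""
    return copies * copy_length_bp


def genome_fraction(total_bp: int, genome_bp: int) -> float:
    """Share of the genome occupied by the element."""
    return total_bp / genome_bp


def representation_ratio(observed_bp: float, expected_bp: float) -> float:
    """Hypothetical helper: observed database coverage over expected coverage.

    A value well below 1 would indicate under-representation. The observed
    value must come from real database hits; none is assumed here.
    """
    return observed_bp / expected_bp


total_bp = expected_repeat_bp(COPY_NUMBER, COPY_LENGTH_BP)
fraction = genome_fraction(total_bp, GENOME_SIZE_BP)
print(f"Expected element sequence: {total_bp} bp")
print(f"Fraction of genome: {fraction:.2e}")
```

Each copy is also roughly half the length of the reported contig N50 (4.7 kb versus 9.8 kb), so a single copy could in principle fit inside a typical contig; the difficulty the abstract describes arises from the near-identity of the copies, not from their length.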

Similar resources

Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium.

The DNA content of eukaryotic nuclei (C-value) varies approximately 200,000-fold, but there is only an approximately 20-fold variation in the number of protein-coding genes. Hence, most C-value variation is ascribed to the repetitive fraction, although little is known about the evolutionary dynamics of the specific components that lead to genome size variation. To understand the modes and mechan...


Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L.) Genome

Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, asses...


McClintock: An Integrated Pipeline for Detecting Transposable Element Insertions in Whole-Genome Shotgun Sequencing Data

Transposable element (TE) insertions are among the most challenging types of variants to detect in genomic data because of their repetitive nature and complex mechanisms of replication. Nevertheless, the recent availability of large resequencing data sets has spurred the development of many new methods to detect TE insertions in whole-genome shotgun sequences. Here we report an integrated bioi...


Comparing the whole-genome-shotgun and map-based sequences of the rice genome.

The rice genome has now been sequenced using whole-genome-shotgun and map-based methods. The relative merits of the two methods are the subject of debate, as they were in the human genome project. In this Opinion article, we will show that the serious discrepancies between the resultant sequences are mostly found in the large transposable elements such as copia and gypsy that populate the inter...


RJPrimers: unique transposable element insertion junction discovery and PCR primer design for marker development

Transposable elements (TE) exist in the genomes of nearly all eukaryotes. TE mobilization through 'cut-and-paste' or 'copy-and-paste' mechanisms causes their insertions into other repetitive sequences, gene loci and other DNA. An insertion of a TE commonly creates a unique TE junction in the genome. TE junctions are also randomly distributed along chromosomes and therefore useful for genome-wid...



Journal:
  • Genome

Volume 55, Issue 2

Pages: -

Publication date: 2012